Genetic diversity of loquat (Eriobotrya japonica) revealed using RAD-Seq SNP markers

Loquat (Eriobotrya japonica) have originated in southeastern China and spread as a cultivated plant worldwide. Many of the loquat genetic resources collected internationally are of unknown origin, and their genetic background requires clarification. This study analyzed the genetic diversity of 95 accessions by using Rad-Seq SNP markers. Data analysis broadly classified loquat into three groups: (1) Japanese and Chinese cultivars and some Japanese strains (wild plants that are not used for commercial cultivation), (2) Vietnamese, Israeli, Greek, USA, and Mexican cultivars and strains, and (3) other Japanese strains. Group 2 is cultivated mostly outside of East Asia and was clearly distinct from the other groups, indicating that varieties of unknown origin with genetic backgrounds different from those of Japanese and Chinese cultivars may have been introduced to Mediterranean countries and North America. Because Japanese and Chinese cultivars belong to group 1, the current Japanese cultivars are derived from genetic resources brought from China. Some of group 1 may have been introduced to Japan before excellent varieties were developed in China, while group 3 may have been indigenous to Japan that have not been introduced by human activities, or may have been brought to Japan by human activities from China.

www.nature.com/scientificreports/ Although loquat has existed in Japan for more than 1000 years, the current Japanese cultivars were probably bred and spread from seeds introduced from China in the Edo period (1600-1868) 11 . Later, the number of cultivars increased through crossbreeding and bud sport mutations, mainly from 'Mogi' , 'Tanaka' , and 'Kusunoki' .
There are still many unknowns in the transmission of loquat, and elucidation of genetic relationships within this species is important and interesting in loquat breeding research. Loquat genetic diversity has been analyzed with SSR markers 13,14 and SNP markers 15 , but the number of markers was small. Genome-wide analysis is crucial for accurate evaluation of genetic diversity. Compared to these conventional methods, whole genome resequencing is the most effective method for studying genetic diversity. However, the cost has not decreased sufficiently to analyze a large number of individuals with whole genome resequencing. Restriction site-associated DNA sequencing (RAD-Seq) is an inexpensive method that can analyze a large number of individuals, and it can analyze a large number of markers compared to conventional methods. We have used RAD-Seq to study the genetic diversity of citrus 16,17 , firefly 18 , Japanese pepper 19 , and razor clam 20 , to construct the linkage map of bronze loquat 21 , and to study the phylogenetic relationships among Aurantioideae (citrus and its relatives) 22 . In this study, we used genome-wide SNP markers obtained by RAD-Seq to analyze the genetic diversity and structure of wild loquat strains from various parts of Japan; Japanese cultivars; and cultivars and strains introduced from China, Vietnam, Israel, Greece, the United States, and Mexico.

Results
Variant detection by mapping of RAD-Seq data. The double-digest RAD-Seq (ddRAD-Seq) of the 95 samples (Table 1) Table S1). The Stacks program constructed new loci with an average coverage depth of 14 times (Supplementary Table S2). The following analysis was performed using data from 1,822 variant sites.
Genetic structure. In principal component analysis (PCA), the contributions of the first and second principal components were 17.1% and 9.47%, respectively (Fig. 2). The first principal component separated the acces-  www.nature.com/scientificreports/ sions into three major groups: (1) Japanese and Chinese cultivars and some Japanese wild strains, (2) cultivars and strains from Vietnam, Israel, Greece, USA, and Mexico, and (3) other Japanese wild strains. The second principal component separated group 3 from the other groups. Wild strains from different parts of Japan were divided into two groups: some formed a separate group (group 3) and some belonged to group 1. Multidimensional scaling (MDS) analysis also divided the accessions into three major groups (Fig. 3) and supported the grouping by PCA. PCA and MDS analyses indicated the presence of subgroups. In group 1, most of the Japanese cultivars were in the upper right part, and the Chinese cultivars and some of the Japanese cultivars (22-24, 28, 38, and 48) were in the lower left part. In group 2, with some exceptions, country-specific subgroups could be discerned: Greek subgroup (86, 87, 89, and 90), Israeli subgroup (78, 79, 82, 84, and 85), and Vietnamese subgroup (72-77). Accessions from the USA and Mexico (80, 91 and 93-95) seem to be close to the Vietnamese subgroup. One of the US cultivars (60) belonged to group 1. www.nature.com/scientificreports/ Cluster analysis was performed using a pairwise distance matrix. The cluster analysis identified the two clades (A and B) (Fig. 4). Group 2 was placed in clade A, and group 1 and group 3 were placed in clade B.
Admixture analysis performed with the number of ancestral populations (K) ranging from 1 to 10 (Fig. 5, Supplementary Fig. S1) clearly divided the accessions into three groups, which strongly supports the PCA and MDS results. Reflecting the presence of subgroups as described above, the most likely number of ancestral The color of the sample number indicates the country or region where the sample was collected. Cyan represents germplasm resources and cultivars from China, darkgreen represents cultivars from Japan, orange represents germplasm resources from Japan, dark blue represents germplasm resources from Vietnam, purple represents cultivars from Israel, pink represents germplasm resources from Greece, black represents germplasm resources and cultivars from North America.   www.nature.com/scientificreports/  www.nature.com/scientificreports/  www.nature.com/scientificreports/ populations was 7 ( Supplementary Fig. S1). No gene flow from other groups was detected in group 3, which consisted of Japanese wild strains.
Population genetic statistics. Estimation of the degree of genetic diversity among the three groups using the F st values ( Table 2) showed that groups 1 and 3 were the genetically closest pair, as indicated by the lowest F st value, and groups 2 and 3 were the genetically most distant pair. The mean values of nucleotide diversity (π) and mean expected heterozygosity (He) were highest in group 2, followed by group 1 (Table 3, Supplementary  Table S3). The mean value of inbreeding coefficient (F is ) was lowest in group 3, whereas the F is values for groups 2 and 3 were similar to each other. The heterozygosity of each individual was calculated (Supplementary Table S4). Twenty-five individuals exhibited relatively low heterozygosity with a value below 0.15. All 11 individuals (61-71) in group 3 and 6 Japanese wild strains (52-54, and 57-59) in group 2 were in this low heterozygosity group.
Elucidation of the process of asexual reproduction. Several cultivars have been generated through asexual reproduction, as in the case of a bud sport mutation. To examine whether the parent cultivar and its possible child cultivar were derived from a single tree by a somatic mutation, we checked the conservation of heterozygosity using pairwise alignments (Table 4, Supplementary Fig. S2). The conservation ratio of heterozygous sites between technical replicates was above 90%, which is the criterion 17 for determining the combinations of the parent-child cultivars derived by asexual reproduction. Consistent with the available records 23 , 'Tanaka' (46) was a parent of 'Morimoto' (37). Heterozygous sites were conserved among ' Amakusagokuwase' (22), ' Amakusawase' (23), and 'Moriowase' (38). According to oral tradition, 'Moriowase' is the oldest, so ' Amakusagokuwase' and ' Amakusawase' are its descendants. Although the records 10 claim that 'Moriowase' (38) is a bud sport mutant of 'Mogi' (32), our analysis rejected this possibility because the conservation ratio of heterozygous sites between these two cultivars was only 19.7%.

Discussion
Our study analyzed the genetic diversity of 95 cultivars and strains of loquat collected from all over the world. On the basis of the analysis of the population structure of 19 Chinese loquat cultivars and strains by RAD-Seq, Hubei Province in China has been suggested as the center of origin of loquat cultivation 6 . However, the authors have not analyzed the movement of loquats following their introduction from China to other countries and have not verified whether the hypothesis that the cultivation of loquat started in a small area in China 24 is correct or  www.nature.com/scientificreports/ not. On the basis of the analysis of samples from around the world, we here propose and discuss a more complex history of loquat cultivation (Fig. 1B). Although genetic diversity analysis using SSR markers revealed no country-specific cultivar clusters 14 , the RAD-Seq analysis used in this study allowed for such clusters to be detected. This was possible because more markers are available in RAD-Seq analysis than in conventional SSR marker analysis. RAD-Seq analysis classified the loquat genetic resources into three groups: (1) Japanese and Chinese cultivars and some Japanese wild strains, (2) Vietnamese, Israeli, Greek, US, and Mexican cultivars and strains, and (3) other Japanese wild strains.
Group 2 had the highest mean values of π and He (Table 3). The F st values (Table 2) showed that group 2 is well separated from groups 1 and 3, which may be due to differences in their place of origin, as discussed below. The F is value of group 2 was higher than that of the Japanese wild strains (Table 3), which may reflect the fact that group 2 plants have been grown and crossbred by humans.
Group 2 was genetically separated from the Japanese and Chinese loquats. Blasco et al. 25 reported that SSR and S-allele markers can distinguish European cultivars from other cultivars. Our results are in good agreement with the above study. Morton 8 reported that loquat genetic resources were introduced to the West from China and Japan, but our study does not agree with this report. When plants are preserved in botanical gardens, records are well kept. However, for commercial use, records are not always kept, because the purpose of plant introduction is different. Group 2 plants may have been introduced to the West for commercial cultivation in a different way than they were introduced to botanical gardens.
The place of origin for group 2 may be in China or Japan, although our analysis failed to find any cultivars or strains in group 2 that originated from China or Japan. Unfortunately, the cultivars and strains from China analyzed in this study are not representative of all loquat cultivars from China, and we were unable to analyze cultivars and strains from southern China, such as Yunnan, Guangdong, and Guangxi. Wang et al. 6 reported that 'Younan' from Guangdong has a close genetic relationship with cultivars from Spain, Italy, and the USA. This suggests that genetic resources from particular regions of China may have been introduced to the West.
The fact that the Vietnamese strains belong to group 2 suggests two possibilities. One is that group 2 originated from Southeast Asian countries other than China, including Vietnam, in particular because cultivars from the USA and strains from Mexico were genetically similar to strains from Vietnam. Interestingly, when group 2 was analyzed in detail, we detected the presence of an Israeli subgroup and a Greek subgroup. They may have originated in Southeast Asian countries other than Vietnam. The second possibility is that cultivars or stains introduced from China and other countries to the USA or France (the former colonial master of Vietnam) were further introduced to Vietnam. Researchers who collected the strains in Vietnam told us that these strains were not used for edible fruit, even though they grew near houses. These plants may have been cultivated in Vietnam as ornamental trees or to be offered to Westerners. Thus, group 2 cultivars are likely to originate from various places.
The differences in taste preferences between the Greek, Israeli, American, and Mexican people and the Japanese and Chinese people may have influenced cultivar differentiation. Indeed, many cultivars and strains from Greece, Israel, USA, and Mexico have higher acid content than those from Japan and China 26,27 . However, differences in taste preferences need to be studied in detail in the future. Although group 2 strains grow in Vietnam, the Vietnamese most likely prefer to eat group 1 fruits because they do not eat group 2 fruits. In Vietnam, loquat imported from East Asia is commercially available.
In group 1, the mean values of π and He were lower than in group 2 but higher than in group 3 ( Table 3). The F st values indicated that group 1 is genetically closer to group 3 than to group 2 ( Table 2). The F is value of group 1 was higher than that of the Japanese wild strains (Table 3), which may reflect the fact that group 1 plants, like group 2 plants, have been grown and crossbred by humans.
The PCA and MDS analyses placed Japanese and Chinese cultivars in the center of the group 1 cluster. The present Japanese cultivars originated from excellent Chinese cultivars introduced at the Edo period (1600-1868) and were further improved in Japan 11 . The detection of group 1 reflects this proposed history. The cultivars were believed to have been improved through crossbreeding and bud sport mutations, using mainly 'Mogi' , 'Tanaka' , and 'Kusunoki' . However, some samples (22-24, 28, 38, and 48) were not related to these three cultivars. Other cultivars introduced at the Edo period or earlier or gene flow from Japanese wild strains may have contributed to the formation of these six cultivars.
With the exception of these six samples, the central part of the cluster in group 1 was divided into a subcluster of Japanese cultivars and a subcluster of Chinese cultivars. This separation may reflect differences in the breeding process between China and Japan, where mainly 'Mogi' , 'Tanaka' , and 'Kusunoki' were used. The slight difference in taste preferences between the Japanese and Chinese may have influenced the formation of the two subclusters, but further research is needed to address this issue.
The PCA and MDS analyses placed the Japanese wild strains at the periphery of group 1. Because group 1 contains Chinese cultivars, the wild strains in this group are closely related to plants introduced to Japan from China. These wild strains may be related to plants introduced before the development of excellent cultivars in China, as well as being related to plants described in ancient Japanese documents. Although other possibilities exist for the five wild strains (52, 53, and 55-57) as discussed below, two strains (54 and 58) are possible descendants of plants introduced to Japan before the Edo period. Analysis of wild strains in China could help to solve this problem, but unfortunately we were unable to analyze them. Among these wild samples, sample 59 was closer to cultivars such as Mogi (32-36, 43, 44, 47), which may be related to cultivar escape.
PCA and MDS analyses placed the Japanese and Chinese cultivars and the Japanese wild strains at the bottom of the cluster in group 1. Admixture analysis suggested that gene flow from isolated Japanese wild strains in group 3 may have occurred in these plants. The finding that gene flow from group 3 may have occurred in Chinese cultivars suggests that plants belonging to group 3 may be present in China. Alternatively, these Chinese cultivars may have been placed here under the influence of populations not analyzed in this study, rather than group 3. Gene flow from group 3 may have occurred to Japanese cultivars ( www.nature.com/scientificreports/ 53, and 55-57). It is interesting that genetic resources belonging to group 3 are not suitable for edible use and do not have useful traits that can be used to develop new varieties, such as disease resistance. There are two possibilities for the origin of group 3 plants growing in Japan. The first possibility is that these plants are indigenous to Japan that have not been introduced by human activities. Genetic diversity analysis using SSR markers has demonstrated that wild strains and cultivars in Japan are genetically different 14 , and our data support this conclusion. Group 3 plants growing in Japan had the lowest mean values of π and He (Table 3). The F st values indicated that group 3 was genetically closer to group 1 than to group 2 ( Table 2). A notable feature of group 3 was that the lowest F is value ( Table 3), suggesting that humans may not have played a role in the formation of this group. Admixture analysis detected no gene flow from other groups to group 3, even though groups 1 and 3 were genetically close, suggesting that the ancestors of group 3 were not introduced by humans from China, but were indigenous to Japan. Common plants often grow in laurel forest ecoregions in East Asia. Loquat is found in these ecoregions, and may have been growing in Japan since prehistoric times. If these plants are indigenous to Japan, the problem to be elucidated in the future is whether plants belonging to this group are present in China. The second possibility is that the group 3 plants growing in Japan were brought to Japan by human activities from China over the last few thousand years. Although modern people do not use group 3 plants for edible purposes, we cannot rule out the possibility that people in the past used them for edible purposes. In order to clarify these issues, it would be desirable to test and analyze individuals from southern China, such as those from Yunnan, Guangdong, and Guangxi provinces, as well as from the northern parts of Southeast Asian countries.
In this study, we also examined the contribution of breeding through asexual reproduction, as in the case of bud sport mutations (Table 4). We confirmed the available records of the asexual emergence of 'Morimoto' from 'Tanaka' . We also found that 'Moriowase' , ' Amakusawase' , and ' Amakusagokuwase' were asexually propagated from a single tree. On the other hand, we rejected the possibility that 'Moriowase' originated asexually from 'Mogi' . The above cultivars determined to have been born by asexual reproduction were very similar in fruit traits to their parent cultivars. Thus, our DNA-level analysis allowed to clarify the origin of some cultivars.
This study demonstrates that RAD-Seq analysis is applicable to the genome analysis of loquat, which has relatively low genetic diversity 14 . The information obtained here can be used for loquat cultivar identification and DNA profiling, and in genetic research and breeding programs.

Methods
Plant materials. The 95 cultivars and strains examined in this study are listed in Table 1. Among them, 24 are Japanese cultivars, including 11 produced by crossbreeding. The Japanese cultivar 'Morimoto' is probably bud sport mutant of 'Tanaka' , and 'Moriowase' is probably bud sport mutant of 'Mogi' . We also examined 21 cultivars from China, 19 Japanese wild strains, 6 strains from Vietnam, 8 cultivars from Israel, 5 strains from Greece, 3 cultivars from the United States of America, and 3 strains from Mexico. All plants were grown at the Nagasaki Prefectural Agricultural and Forestry Technology Development Center, Nagasaki, Japan. These genetic resources were introduced to Japan before 1997. The collectors took the permit, which was required at the time, and obtained the owner's permission. Samples 33-36 were technical replicates of sample 32, and samples 40 and 44 were technical replicates of samples 39 and 43, respectively. DNA extraction and double-digest restriction site-associated DNA sequencing. Genomic DNA was extracted by the CTAB method 28 , treated with RNase and then with phenol/chloroform. The concentration of DNA was measured with a Qubit dsDNA BR Assay Kit (Invitrogen, MA, USA) and adjusted to 20 ng/ µl. Libraries for ddRAD-Seq were prepared using the method of Sakaguchi et al. 29 , which is a modification of the original ddRAD-Seq method 30  Analysis of ddRAD-Seq data. Adapter sequences and low-quality bases were trimmed using fastp v 0.23.0 with default parameters 31 . The genome sequence of a Chinese cultivar 'Seventh star' was used as the reference 32 . The BWA-MEM algorithm of Burrows-Wheeler Aligner (version 0.7.17 r1188) 33 was used for mapping to the reference genome sequence. The ref_map.pl script implemented in the Stacks package (version 2.55) 34 was used for SNP identification and genotyping (with-X "populations: − R 0.7-ordered-export-vcf-plink-phylipphylip-var-write-single-snp-min-maf 0.05"). For the PCA and MDS analyses, genotyping data were created to specify all the samples by assigning one individual per population. Each Obs_Het value from "poples.sum-stats_summary.tsv" was used for the heterozygosity of each individual. This program was also used to create pairwise alignments between two individuals by changing the -R option to one.
For statistical analysis, groups separated by PCA and MDS analyses were considered as separate populations in the population map data required for the Stacks package. The denovo_map.pl script was re-performed after modifying the data for the population map (with − r 0.7-X "populations: -k-ordered-export-vcf-plinkphylip-phylip-var-write-single-snp-min-maf 0.05").

Principal component analysis, multidimensional scaling analyses and cluster analysis. The
SNPRelate package 35 in the R software environment (version 4.1.1) and the vcf file generated by the denovo_ map.pl script were used for PCA and MDS analyses. The program converted the vcf file into a gds (genomic data structure) file and created a PCA diagram. Then, the contribution of each principal component was calculated. At this step, only bi-allelic loci were used. The program also used the gds file to create an MDS diagram. The resulting images were generated with the basic functions of the R software environment. The SNPRelate program plotted the dendrogram by considering the identity by state (IBS) pairwise distance. Images of the results were generated using the basic functions of R software environment.